19 research outputs found

    Large Scale Study of Ligand-Protein Relative Binding Free Energy Calculations: Actionable Predictions from Statistically Robust Protocols

    The accurate and reliable prediction of protein-ligand binding affinities can play a central role in the drug discovery process as well as in personalized medicine. Of considerable importance during lead optimization are the alchemical free energy methods that furnish an estimation of relative binding free energies (RBFE) of similar molecules. Recent advances in these methods have increased their speed, accuracy, and precision. This is evident from the increasing number of retrospective as well as prospective studies employing them. However, such methods still have limited applicability in real-world scenarios due to a number of important yet unresolved issues. Here, we report the findings from a large data set comprising over 500 ligand transformations spanning over 300 ligands binding to a diverse set of 14 different protein targets which furnish statistically robust results on the accuracy, precision, and reproducibility of RBFE calculations. We use ensemble-based methods which are the only way to provide reliable uncertainty quantification given that the underlying molecular dynamics is chaotic. These are implemented using TIES (Thermodynamic Integration with Enhanced Sampling). Results achieve chemical accuracy in all cases. Ensemble simulations also furnish information on the statistical distributions of the free energy calculations which exhibit non-normal behavior. We find that the "enhanced sampling" method known as replica exchange with solute tempering degrades RBFE predictions. We also report definitively on numerous associated alchemical factors including the choice of ligand charge method, flexibility in ligand structure, and the size of the alchemical region including the number of atoms involved in transforming one ligand into another. Our findings provide a key set of recommendations that should be adopted for the reliable application of RBFE methods
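
The ensemble-based thermodynamic integration idea in this abstract can be sketched as follows. This is a minimal illustration, not the TIES implementation: the function names, the trapezoidal rule, and the quadrature error propagation are assumptions made for the example, and the input is taken to be one time-averaged dU/dlambda value per replica at each lambda window.

```python
import statistics


def trapezoid_weights(lambdas):
    """Trapezoidal-rule weight of each lambda window (assumed integration scheme)."""
    n = len(lambdas)
    weights = []
    for i in range(n):
        left = lambdas[i] - lambdas[i - 1] if i > 0 else 0.0
        right = lambdas[i + 1] - lambdas[i] if i < n - 1 else 0.0
        weights.append((left + right) / 2.0)
    return weights


def ensemble_ti(lambdas, dudl):
    """Free energy change and its uncertainty from an ensemble of replicas.

    dudl[i] holds one time-averaged dU/dlambda value per replica at lambdas[i].
    The ensemble mean of each window is integrated over lambda, and the
    per-window standard errors are combined in quadrature.
    """
    weights = trapezoid_weights(lambdas)
    means = [statistics.fmean(window) for window in dudl]
    sems = [statistics.stdev(window) / len(window) ** 0.5 for window in dudl]
    delta_g = sum(w * m for w, m in zip(weights, means))
    error = sum((w * s) ** 2 for w, s in zip(weights, sems)) ** 0.5
    return delta_g, error


def relative_binding(bound, unbound):
    """Relative binding free energy: bound (complex) leg minus unbound (solvent) leg.

    Each argument is a (delta_g, error) pair as returned by ensemble_ti.
    """
    ddg = bound[0] - unbound[0]
    error = (bound[1] ** 2 + unbound[1] ** 2) ** 0.5
    return ddg, error
```

Calling `ensemble_ti` once for the ligand transformation in the protein and once in solvent, then passing both results to `relative_binding`, gives a relative binding free energy with an ensemble-derived error bar. Because the abstract reports non-normal free energy distributions, a standard error is only a first summary; a bootstrap over replicas would be one alternative.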

    Comparison of Equilibrium and Nonequilibrium Approaches for Relative Binding Free Energy Predictions

    Alchemical relative binding free energy calculations have recently found important applications in drug optimization. A series of congeneric compounds are generated from a preidentified lead compound, and their relative binding affinities to a protein are assessed in order to optimize candidate drugs. While methods based on equilibrium thermodynamics have been extensively studied, an approach based on nonequilibrium methods has recently been reported together with claims of its superiority. However, these claims pay insufficient attention to the basis and reliability of both methods. Here we report a comparative study of the two approaches across a large data set, comprising more than 500 ligand transformations spanning in excess of 300 ligands binding to a set of 14 diverse protein targets. Ensemble methods are essential to quantify the uncertainty in these calculations, not only for the reasons already established in the equilibrium approach but also to ensure that the nonequilibrium calculations reside within their domain of validity. If and only if ensemble methods are applied, we find that the nonequilibrium method can achieve accuracy and precision comparable to those of the equilibrium approach. Compared to the equilibrium method, the nonequilibrium approach can reduce computational costs but introduces higher computational complexity and longer wall clock times. There are, however, cases where the standard length of a nonequilibrium transition is not sufficient, necessitating a complete rerun of the entire set of transitions. This significantly increases the computational cost and proves to be highly inconvenient during large-scale applications. Our findings provide a key set of recommendations that should be adopted for the reliable implementation of nonequilibrium approaches to relative binding free energy calculations in ligand-protein systems
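
As a rough illustration of how a nonequilibrium free energy estimate is built from an ensemble of switching transitions, the sketch below applies the Jarzynski equality to a list of work values. The abstract does not say which estimator the study uses, so the choice of Jarzynski, the function name, and the default temperature are assumptions for this example only.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def jarzynski_free_energy(work_values, temperature=300.0):
    """Estimate a free energy difference from nonequilibrium work values.

    Implements dG = -kT * ln( mean( exp(-W / kT) ) ) with a log-sum-exp
    shift so that large work values do not underflow.
    work_values are in kcal/mol, one per independent transition.
    """
    kt = KB_KCAL * temperature
    exponents = [-w / kt for w in work_values]
    shift = max(exponents)
    total = sum(math.exp(e - shift) for e in exponents)
    log_mean = shift + math.log(total / len(exponents))
    return -kt * log_mean
```

The exponential average is dominated by the lowest work values, so the estimate is reliable only when the transitions are slow enough for the work distribution to be well sampled. This is one way to read the abstract's point that ensembles are needed to keep nonequilibrium calculations within their domain of validity.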

    Pattern formation in Passiflora incarnata: An activator-inhibitor model

    Based on a careful examination of the onset of violet colored dots along the filaments in the developing floral bud stage and the formation of alternating bands of violet and white color in the matured flowers of Passiflora incarnata (Passion flower), it is concluded that the pattern arises from a competition between the production of violet colored anthocyanin and the colorless flavonols along the filaments. The activator-inhibitor model of Gierer and Meinhardt along with the reaction diffusion theory of Turing is used to explain the formation of concentric rings in the flower
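
For orientation, a commonly quoted form of the Gierer-Meinhardt activator-inhibitor equations is given below. The abstract does not state which variant or parameter values the paper uses, nor how anthocyanin and flavonol production map onto the two fields, so the symbols here (activator a, inhibitor h, diffusion constants D_a and D_h, production rate rho, decay rates mu_a and mu_h, basal production sigma_a and sigma_h) are assumptions for illustration.

```latex
\begin{aligned}
\frac{\partial a}{\partial t} &= D_a \nabla^2 a + \rho\,\frac{a^2}{h} - \mu_a a + \sigma_a, \\
\frac{\partial h}{\partial t} &= D_h \nabla^2 h + \rho\, a^2 - \mu_h h + \sigma_h .
\end{aligned}
```

Stable spatial patterns of the Turing type arise when the inhibitor diffuses much faster than the activator, that is, when D_h is much larger than D_a (short-range activation, long-range inhibition).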

    Alchemical Free Energy Estimators and Molecular Dynamics Engines: Accuracy, Precision, and Reproducibility

    The binding free energy between a ligand and its target protein is an essential quantity to know at all stages of the drug discovery pipeline. Assessing this value computationally can offer insight into where efforts should be focused in the pursuit of effective therapeutics to treat a myriad of diseases. In this work, we examine the computation of alchemical relative binding free energies with an eye for assessing reproducibility across popular molecular dynamics packages and free energy estimators. The focus of this work is on 54 ligand transformations from a diverse set of protein targets: MCL1, PTP1B, TYK2, CDK2, and thrombin. These targets are studied with three popular molecular dynamics packages: OpenMM, NAMD2, and NAMD3 alpha. Trajectories collected with these packages are used to compare relative binding free energies calculated with thermodynamic integration and free energy perturbation methods. The resulting binding free energies show good agreement between molecular dynamics packages with an average mean unsigned error between them of 0.50 kcal/mol. The correlation between packages is very good, with the lowest Spearman's, Pearson's and Kendall's tau correlation coefficients being 0.92, 0.91, and 0.76, respectively. Agreement between thermodynamic integration and free energy perturbation is shown to be very good when using ensemble averaging
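
The agreement statistics quoted in this abstract (mean unsigned error, Pearson's r, Spearman's rho, Kendall's tau) can be computed as sketched below. The function names and the sample values are hypothetical and are not data from the paper; Kendall's tau is implemented here in its simple tau-a form, which may differ from the variant the authors used when ties are present.

```python
import statistics


def mean_unsigned_error(xs, ys):
    """Mean absolute difference between paired values."""
    return statistics.fmean(abs(x - y) for x, y in zip(xs, ys))


def pearson(xs, ys):
    """Pearson linear correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical relative binding free energies (kcal/mol) from two MD packages.
package_a = [-1.2, 0.3, 0.8, 1.5, 2.1]
package_b = [-1.0, 0.8, 0.5, 1.4, 2.6]
print(mean_unsigned_error(package_a, package_b))  # about 0.32
print(spearman(package_a, package_b))             # about 0.9
print(kendall_tau(package_a, package_b))          # 0.8
```

Kendall's tau counts only pairwise order agreements, so it is typically lower than Pearson's or Spearman's coefficient for the same data, which is consistent with the ordering of the values reported in the abstract.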

    Ensemble-Based Approaches Ensure Reliability and Reproducibility

    It is increasingly widely recognized that ensemble-based approaches are required to achieve reliability, accuracy, and precision in molecular dynamics calculations. The purpose of the present article is to address a frequently raised question: what is the optimal way to perform ensemble simulation to calculate quantities of interest?

    Long Time Scale Ensemble Methods in Molecular Dynamics: Ligand–Protein Interactions and Allostery in SARS-CoV-2 Targets

    We subject a series of five protein-ligand systems which contain important SARS-CoV-2 targets, 3-chymotrypsin-like protease (3CLPro), papain-like protease, and adenosine ribose phosphatase, to long time scale and adaptive sampling molecular dynamics simulations. By performing ensembles of ten or twelve 10 μs simulations for each system, we accurately and reproducibly determine ligand binding sites, both crystallographically resolved and otherwise, thereby discovering binding sites that can be exploited for drug discovery. We also report robust, ensemble-based observation of conformational changes that occur at the main binding site of 3CLPro due to the presence of another ligand at an allosteric binding site explaining the underlying cascade of events responsible for its inhibitory effect. Using our simulations, we have discovered a novel allosteric mechanism of inhibition for a ligand known to bind only at the substrate binding site. Due to the chaotic nature of molecular dynamics trajectories, regardless of their temporal duration individual trajectories do not allow for accurate or reproducible elucidation of macroscopic expectation values. Unprecedentedly at this time scale, we compare the statistical distribution of protein-ligand contact frequencies for these ten/twelve 10 μs trajectories and find that over 90% of trajectories have significantly different contact frequency distributions. Furthermore, using a direct binding free energy calculation protocol, we determine the ligand binding free energies for each of the identified sites using long time scale simulations. The free energies differ by 0.77 to 7.26 kcal/mol across individual trajectories depending on the binding site and the system. We show that, although this is the standard way such quantities are currently reported at long time scale, individual simulations do not yield reliable free energies. Ensembles of independent trajectories are necessary to overcome the aleatoric uncertainty in order to obtain statistically meaningful and reproducible results. Finally, we compare the application of different free energy methods to these systems and discuss their advantages and disadvantages. Our findings here are generally applicable to all molecular dynamics based applications and not confined to the free energy methods used in this study.

    Ensemble Simulations and Experimental Free Energy Distributions: Evaluation and Characterization of Isoxazole Amides as SMYD3 Inhibitors

    Optimization of binding affinities for ligands to their target protein is a primary objective in rational drug discovery. Herein, we report on a collaborative study that evaluates various compounds designed to bind to the SET and MYND domain-containing protein 3 (SMYD3). SMYD3 is a histone methyltransferase and plays an important role in transcriptional regulation in cell proliferation, cell cycle, and human carcinogenesis. Experimental measurements using the scintillation proximity assay show that the distributions of binding free energies from a large number of independent measurements exhibit non-normal properties. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to predict the binding free energies and to provide a detailed chemical insight into the nature of ligand-protein binding. Our results show that the 1-trajectory ESMACS protocol works well for the set of ligands studied here. Although one unexplained outlier exists, we obtain excellent statistical ranking across the set of compounds from the ESMACS protocol and good agreement between calculations and experiments for the relative binding free energies from the TIES protocol. ESMACS and TIES are again found to be powerful protocols for the accurate comparison of the binding free energies

    The performance of ensemble-based free energy protocols in computing binding affinities to ROS1 kinase

    Optimization of binding affinities for compounds to their target protein is a primary objective in drug discovery. Herein we report on a collaborative study that evaluates a set of compounds binding to ROS1 kinase. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to rank the binding free energies. The predicted binding free energies from ESMACS simulations show good correlations with experimental data for subsets of the compounds. Consistent binding free energy differences are generated for TIES and ESMACS. Although an unexplained overestimation exists, we obtain excellent statistical rankings across the set of compounds from the TIES protocol, with a Pearson correlation coefficient of 0.90 between calculated and experimental activities

    PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications

    Computational methods, and more recently machine learning methods, have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow for the development of models that predict binding affinities better than state-of-the-art scoring functions are important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components (electrostatic, van der Waals, polar and non-polar solvation energy) calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.

    Which corners to cut? Guidelines on choosing optimal settings to maximise sampling with limited computational resources

    Despite the increasingly wide availability of computational resources, it is still challenging for researchers to perform comprehensive molecular dynamics (MD) simulations on an industrial scale. With ensemble approaches, great accuracy and precision are achievable at the cost of considerable computational effort. There is a trade-off between accuracy, precision and the compute cost of performing ensembles. A frequently raised question is: what is the optimal way to perform an MD-based calculation of one or more properties of interest? Here, we use our findings from extensive ensemble molecular dynamics simulations of ligand-protein systems to underpin the recommendations we make for cases where computational cost is an important consideration. We recommend performing a minimum of 5 replicas for the ESMACS-style protocol and 3 replicas (per λ window) for the TIES-like protocol, and the use of a stepwise procedure to reduce costs so as to minimise the loss of accuracy and precision of the results obtained.